feat(qrm, sysadvisor): Periodically (every 300s) clean up excessive (>2000) dying mem cgroups#1069

Open
Zera-Algorithm wants to merge 5 commits into kubewharf:main from Zera-Algorithm:main


@Zera-Algorithm

What type of PR is this?

feat(qrm, sysadvisor): Periodically (every 300s) clean up excessive (>2000) dying mem cgroups.

BREAKING CHANGES:

  1. Add a scheduled task in SysAdvisor to trigger dying memcg cleanup every 300s.
  2. Add a new task in the QRM plugin to trigger dying memcg cleanup via memory reclamation using memory.reclaim.

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

@codecov

codecov bot commented Feb 2, 2026

Codecov Report

❌ Patch coverage is 80.29197% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.08%. Comparing base (9d6f0ff) to head (c79ad03).
⚠️ Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/util/cgroup/manager/cgroup.go | 76.31% | 6 Missing and 3 partials ⚠️ |
| ...ins/memory/dynamicpolicy/policy_advisor_handler.go | 77.77% | 4 Missing and 4 partials ⚠️ |
| pkg/util/cgroup/manager/v2/fs_linux.go | 86.95% | 2 Missing and 1 partial ⚠️ |
| ...saware/resource/memory/plugin/memory_offloading.go | 92.30% | 1 Missing and 1 partial ⚠️ |
| pkg/util/cgroup/manager/fake_manager.go | 0.00% | 2 Missing ⚠️ |
| pkg/util/cgroup/manager/v1/fs_linux.go | 0.00% | 2 Missing ⚠️ |
| ...pp/options/sysadvisor/qosaware/qos_aware_plugin.go | 87.50% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1069      +/-   ##
==========================================
+ Coverage   60.83%   61.08%   +0.25%     
==========================================
  Files         739      739              
  Lines       69769    69912     +143     
==========================================
+ Hits        42443    42705     +262     
+ Misses      22567    22431     -136     
- Partials     4759     4776      +17     

| Flag | Coverage Δ |
|---|---|
| unittest | 61.08% <80.29%> (+0.25%) ⬆️ |

@JulyWindK JulyWindK added the workflow/need-review review: test succeeded, need to review label Mar 16, 2026
 type QoSAwarePluginOptions struct {
-	SyncPeriod time.Duration
+	SyncPeriod              time.Duration
+	EnableDyingMemcgReclaim bool
Collaborator

Would it be better to put this configuration in MemoryAdvisorOptions?

@JulyWindK
Collaborator

The commits need to be squashed.

…>2000) dying mem cgroups.

BREAKING CHANGES:
1. Add a scheduled task in SysAdvisor to trigger dying memcg cleanup every 300s.
2. Add a new task in the QRM plugin to trigger dying memcg cleanup via memory reclamation using memory.reclaim.
…ommon `manager.go` interface, and using `MemoryOffloadingWithAbsolutePath` instead of self-written `invokeMemoryReclaim`